Make Rc<T>::deref and Arc<T>::deref zero-cost #132553

Open · wants to merge 1 commit into base: master

EFanZh
Contributor

@EFanZh EFanZh commented Nov 3, 2024

Currently, Rc<T> and Arc<T> store pointers to RcInner<T> and ArcInner<T>. This PR changes the pointers so that they point to T directly instead.

This is based on the assumption that we access the T value more frequently than the reference counts. With this change, accessing the data no longer requires offsetting the pointer from RcInner<T> or ArcInner<T> to the contained data. This change might also enable some possibly useful future optimizations, such as:

  • Convert &[Rc<T>] into &[&T] within O(1) time.
  • Convert &[Rc<T>] into Vec<&T> utilizing memcpy.
  • Convert &Option<Rc<T>> into Option<&T> without branching.
  • Make Rc<T> and Arc<T> FFI compatible types where T: Sized.
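
To make the layout idea concrete, here is a minimal sketch of a pointer that stores the address of T and finds its counter just in front of the value. It is not the PR's implementation: ToyRc, Counts, and the single strong counter are hypothetical names and simplifications, and weak references and unsized T are left out.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::cell::Cell;
use std::ops::Deref;
use std::ptr::{self, NonNull};

/// Hypothetical header (assumption): only a strong counter, no weak counter.
pub struct Counts {
    strong: Cell<usize>,
}

/// Toy single-threaded pointer that stores the address of `T`, not of the header.
pub struct ToyRc<T> {
    value: NonNull<T>,
}

impl<T> ToyRc<T> {
    /// Layout of the whole allocation and the byte offset of the value in it.
    fn layout() -> (Layout, usize) {
        Layout::new::<Counts>()
            .extend(Layout::new::<T>())
            .expect("layout overflow")
    }

    pub fn new(value: T) -> Self {
        let (layout, offset) = Self::layout();
        unsafe {
            let base = alloc(layout);
            if base.is_null() {
                handle_alloc_error(layout);
            }
            let value_ptr = base.add(offset).cast::<T>();
            value_ptr.write(value);
            // The header sits directly in front of the value.
            value_ptr
                .cast::<Counts>()
                .sub(1)
                .write(Counts { strong: Cell::new(1) });
            ToyRc { value: NonNull::new_unchecked(value_ptr) }
        }
    }

    /// Reaching the counter is the only operation that needs pointer arithmetic.
    fn counts(&self) -> &Counts {
        unsafe { &*self.value.as_ptr().cast::<Counts>().sub(1) }
    }

    pub fn strong_count(&self) -> usize {
        self.counts().strong.get()
    }
}

impl<T> Deref for ToyRc<T> {
    type Target = T;

    /// No offset computation: the stored pointer already is the value pointer.
    fn deref(&self) -> &T {
        unsafe { self.value.as_ref() }
    }
}

impl<T> Clone for ToyRc<T> {
    fn clone(&self) -> Self {
        let counts = self.counts();
        counts.strong.set(counts.strong.get() + 1);
        ToyRc { value: self.value }
    }
}

impl<T> Drop for ToyRc<T> {
    fn drop(&mut self) {
        let remaining = self.counts().strong.get() - 1;
        self.counts().strong.set(remaining);
        if remaining == 0 {
            let (layout, offset) = Self::layout();
            unsafe {
                ptr::drop_in_place(self.value.as_ptr());
                dealloc(self.value.as_ptr().cast::<u8>().sub(offset), layout);
            }
        }
    }
}
```

The trade-off in this sketch matches the PR's stated assumption: `deref` becomes a plain pointer read, while every access to the counter pays for a subtraction.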

@rustbot
Collaborator

rustbot commented Nov 3, 2024

r? @jhpratt

rustbot has assigned @jhpratt.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 3, 2024
@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch from b283c44 to ae36f44 Compare November 3, 2024 09:14
@marmeladema
Contributor

Would it potentially enable those types to have an FFI-compatible ABI, so that they could be passed to and returned from FFI functions directly, like Box?

@EFanZh
Contributor Author

EFanZh commented Nov 3, 2024

> Would it potentially enable those types to have an FFI-compatible ABI, so that they could be passed to and returned from FFI functions directly, like Box?

I think in theory it is possible, at least for sized types, but I am not familiar with how to formally make it so.

@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch from ae36f44 to 0d6165f Compare November 3, 2024 11:21
@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch from 0d6165f to 98edd5b Compare November 3, 2024 13:06
@jhpratt
Member

jhpratt commented Nov 3, 2024

r? libs

@rustbot rustbot assigned joboet and unassigned jhpratt Nov 3, 2024
@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch from 98edd5b to 8beb51d Compare November 4, 2024 16:29
@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch from 8beb51d to d7879fa Compare November 4, 2024 17:26
@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch from d7879fa to 317aa0e Compare November 4, 2024 18:40
@joboet
Member

joboet commented Nov 7, 2024

@EFanZh Is this ready for review? If so, please un-draft the PR.

@EFanZh
Contributor Author

EFanZh commented Nov 7, 2024

@joboet: The source code part is mostly done, but I haven’t finished updating LLDB and CDB pretty printers. The CI doesn’t seem to run those tests.

@joboet
Member

joboet commented Nov 8, 2024

No worries! I just didn't want to keep you waiting in case you had forgotten to change the state.
@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 8, 2024
@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch 3 times, most recently from f243654 to 1308bf6 Compare November 11, 2024 18:35
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 23, 2025
@bors
Contributor

bors commented Feb 23, 2025

⌛ Trying commit ae9240c with merge 1a76f3d...

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 23, 2025
@bors
Contributor

bors commented Feb 23, 2025

☀️ Try build successful - checks-actions
Build commit: 1a76f3d (1a76f3df0b6373e760df2514a5af2587f3e01aff)

@rust-timer
Collaborator

Finished benchmarking commit (1a76f3d): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.3% | [0.2%, 0.5%] | 5 |
| Regressions ❌ (secondary) | 0.4% | [0.3%, 0.8%] | 8 |
| Improvements ✅ (primary) | -0.6% | [-1.5%, -0.2%] | 8 |
| Improvements ✅ (secondary) | -1.4% | [-2.6%, -0.4%] | 5 |
| All ❌✅ (primary) | -0.2% | [-1.5%, 0.5%] | 13 |

Max RSS (memory usage)

Results (primary 2.0%, secondary -0.9%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 5.3% | [1.6%, 13.9%] | 6 |
| Regressions ❌ (secondary) | 1.9% | [1.9%, 1.9%] | 1 |
| Improvements ✅ (primary) | -3.0% | [-5.2%, -1.7%] | 4 |
| Improvements ✅ (secondary) | -3.6% | [-3.6%, -3.6%] | 1 |
| All ❌✅ (primary) | 2.0% | [-5.2%, 13.9%] | 10 |

Cycles

Results (primary -1.2%, secondary -1.9%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.2% | [-1.4%, -1.0%] | 2 |
| Improvements ✅ (secondary) | -1.9% | [-2.4%, -1.5%] | 2 |
| All ❌✅ (primary) | -1.2% | [-1.4%, -1.0%] | 2 |

Binary size

Results (primary 0.2%, secondary -0.9%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.5% | [0.0%, 1.6%] | 54 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.3% | [-1.5%, -0.0%] | 24 |
| Improvements ✅ (secondary) | -0.9% | [-6.9%, -0.1%] | 46 |
| All ❌✅ (primary) | 0.2% | [-1.5%, 1.6%] | 78 |

Bootstrap: 772.144s -> 774.372s (0.29%)
Artifact size: 359.81 MiB -> 359.75 MiB (-0.02%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 23, 2025
@rustbot rustbot added the has-merge-commits PR has merge commits, merge with caution. label Feb 24, 2025
@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch from fab2460 to 320d0c5 Compare February 24, 2025 13:30
@rustbot rustbot removed the has-merge-commits PR has merge commits, merge with caution. label Feb 24, 2025
@EFanZh
Contributor Author

EFanZh commented Feb 26, 2025

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 26, 2025
@bors
Contributor

bors commented Mar 7, 2025

☔ The latest upstream changes (presumably #138155) made this pull request unmergeable. Please resolve the merge conflicts.

@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch from 4074802 to fd02c08 Compare March 8, 2025 05:53
@bors
Contributor

bors commented Mar 8, 2025

☔ The latest upstream changes (presumably #138208) made this pull request unmergeable. Please resolve the merge conflicts.

@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch from fd02c08 to 384ea40 Compare March 9, 2025 07:10
@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch from 384ea40 to 0bdb018 Compare March 9, 2025 07:18
@scottmcm
Member

Neutral-ish on icounts, improved on cycles, and even shrinks optimized binaries? Nice.

//!
//! - Making reference-counting pointers have ABI-compatible representation as raw pointers so we
//!   can use them directly in FFI interfaces.
//! - Converting `Option<Rc<T>>` to `Option<&T>` with a memory copy operation.

Hmm, this one should optimize to that already with what you've already written here, right?

You could consider adding a codegen test, like

use std::sync::Arc;

#[no_mangle]
pub fn option_arc_as_deref_is_nop(x: &Option<Arc<i32>>) -> Option<&i32> {
    // CHECK-LABEL: @option_arc_as_deref_is_nop(ptr
    // CHECK: %[[R:.+]] = load ptr, ptr %x
    // CHECK: ret ptr %[[R]]
    x.as_deref()
}

Comment on lines +362 to +368
impl Deref for RcLayout {
    type Target = Layout;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

Hmm, if "external" things should only use the inner layout field through this deref, would it be worth putting RcLayout in a separate module to enforce that with privacy?

(This is one of those places that really wants to be able to just do unsafe struct RcLayout(Layout); to enforce it that way...)
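
A rough sketch of the privacy idea, assuming a hypothetical module name rc_layout, a hypothetical constructor for_value_layout, and a placeholder header of two usize counters (none of these come from the PR): code outside the module can read the inner Layout through Deref but cannot name or overwrite the field.

```rust
mod rc_layout {
    use std::alloc::{Layout, LayoutError};
    use std::ops::Deref;

    /// Placeholder header layout (assumption): two `usize` counters.
    const HEADER: Layout = Layout::new::<[usize; 2]>();

    /// Stand-in for the PR's `RcLayout`. The field is private to this module,
    /// so every value must come from `for_value_layout`.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub struct RcLayout(Layout);

    impl RcLayout {
        /// Hypothetical constructor: header followed by the value.
        pub fn for_value_layout(value: Layout) -> Result<Self, LayoutError> {
            let (combined, _value_offset) = HEADER.extend(value)?;
            Ok(RcLayout(combined.pad_to_align()))
        }
    }

    impl Deref for RcLayout {
        type Target = Layout;

        fn deref(&self) -> &Layout {
            &self.0
        }
    }
}

/// Code outside the module reads through `Deref`; `layout.0` would not compile here.
pub fn allocation_size_for<T>() -> usize {
    let layout = rc_layout::RcLayout::for_value_layout(std::alloc::Layout::new::<T>())
        .expect("layout overflow");
    layout.size()
}
```

With the field hidden this way, the invariant "this Layout always has room for the header" only has to be checked inside one module.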

Comment on lines +370 to +373
trait RcLayoutExt {
    /// Computes `RcLayout` at compile time if `Self` is `Sized`.
    const RC_LAYOUT: RcLayout;
}

Nice that we only need one of these, since Rc and Arc can share the constant 👍
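
As an illustration of why one constant can serve both pointer types, here is a sketch of a blanket-implemented associated constant evaluated at compile time. The constant names VALUE_OFFSET and RC_ALLOC_SIZE, the two-usize header, and the CacheLine type are assumptions for this example; the PR's constant has type RcLayout rather than usize.

```rust
use std::mem;

/// Placeholder header (assumption): two `usize` counters.
type Header = [usize; 2];

/// Rounds the header size up to the value's alignment (a power of two).
const fn value_offset(value_align: usize) -> usize {
    let header = mem::size_of::<Header>();
    (header + value_align - 1) & !(value_align - 1)
}

/// Sketch of a per-type constant; implemented once for every sized `T`.
pub trait RcLayoutExt {
    /// Byte offset of the value inside the allocation.
    const VALUE_OFFSET: usize;
    /// Bytes needed for header, padding, and value.
    const RC_ALLOC_SIZE: usize;
}

impl<T> RcLayoutExt for T {
    const VALUE_OFFSET: usize = value_offset(mem::align_of::<T>());
    const RC_ALLOC_SIZE: usize = Self::VALUE_OFFSET + mem::size_of::<T>();
}

/// Over-aligned example type, used only to show the padding case.
#[repr(align(64))]
pub struct CacheLine(pub [u8; 64]);
```

Nothing in the constant depends on whether the counters are atomic, which is why a single-threaded and an atomic pointer type could read the same value.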

Comment on lines +389 to +393
unsafe fn ref_counts_ptr_from_value_ptr(value_ptr: NonNull<()>) -> NonNull<RefCounts> {
    const REF_COUNTS_OFFSET: usize = size_of::<RefCounts>();

    unsafe { value_ptr.byte_sub(REF_COUNTS_OFFSET) }.cast()
}

ymmv: since you need to cast in this function anyway, you could consider avoiding the need for the cast-to-unit in the callers of this by having this be something like

Suggested change
- unsafe fn ref_counts_ptr_from_value_ptr(value_ptr: NonNull<()>) -> NonNull<RefCounts> {
-     const REF_COUNTS_OFFSET: usize = size_of::<RefCounts>();
-     unsafe { value_ptr.byte_sub(REF_COUNTS_OFFSET) }.cast()
- }
+ unsafe fn ref_counts_ptr_from_value_ptr<T: ?Sized>(value_ptr: NonNull<T>) -> NonNull<RefCounts> {
+     unsafe { value_ptr.cast::<RefCounts>().sub(1) }
+ }

(That ought to simplify the MIR too, since byte_sub has to cast to NonNull<u8> then cast back again, but if you cast and can just sub you avoid that step. Of course the conversions like that are optimized out by LLVM anyway, but...)
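
A small sketch comparing the two forms on a stack value shows that they compute the same address. Block, value_ptr_of, and the two function names are hypothetical scaffolding, and the example assumes the value's alignment does not exceed that of RefCounts, so no padding separates the header from the value.

```rust
use std::mem::size_of;
use std::ptr::{addr_of_mut, NonNull};

/// Stand-in header; the field types are an assumption for this example.
#[repr(C)]
pub struct RefCounts {
    pub strong: usize,
    pub weak: usize,
}

/// Test scaffolding: a header immediately followed by a value.
#[repr(C)]
pub struct Block<T> {
    pub counts: RefCounts,
    pub value: T,
}

/// Pointer to `block.value` that keeps provenance over the whole `Block`.
pub fn value_ptr_of<T>(block: &mut Block<T>) -> NonNull<T> {
    let base: *mut Block<T> = block;
    unsafe { NonNull::new_unchecked(addr_of_mut!((*base).value)) }
}

/// Byte-based form: the caller erases the type, the callee steps back in bytes.
///
/// # Safety
/// `value_ptr` must point at the `value` field of a live `Block`.
pub unsafe fn counts_via_byte_sub(value_ptr: NonNull<()>) -> NonNull<RefCounts> {
    unsafe { value_ptr.byte_sub(size_of::<RefCounts>()) }.cast()
}

/// Cast-first form: step back by one `RefCounts` element.
///
/// # Safety
/// Same requirement as `counts_via_byte_sub`.
pub unsafe fn counts_via_cast_sub<T: ?Sized>(value_ptr: NonNull<T>) -> NonNull<RefCounts> {
    unsafe { value_ptr.cast::<RefCounts>().sub(1) }
}
```

The generic version also accepts the value pointer as-is, so callers no longer need a `.cast::<()>()` at each call site.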

Comment on lines +395 to +419
/// Get a pointer to the strong counter object in the same allocation with a value pointed to by
/// `value_ptr`.
///
/// # Safety
///
/// - `value_ptr` must point to a value object (can be uninitialized or dropped) that lives in a
///   reference-counted allocation.
unsafe fn strong_count_ptr_from_value_ptr(value_ptr: NonNull<()>) -> NonNull<UnsafeCell<usize>> {
    const STRONG_OFFSET: usize = size_of::<RefCounts>() - mem::offset_of!(RefCounts, strong);

    unsafe { value_ptr.byte_sub(STRONG_OFFSET) }.cast()
}

/// Get a pointer to the weak counter object in the same allocation with a value pointed to by
/// `value_ptr`.
///
/// # Safety
///
/// - `value_ptr` must point to a value object (can be uninitialized or dropped) that lives in a
///   reference-counted allocation.
unsafe fn weak_count_ptr_from_value_ptr(value_ptr: NonNull<()>) -> NonNull<UnsafeCell<usize>> {
    const WEAK_OFFSET: usize = size_of::<RefCounts>() - mem::offset_of!(RefCounts, weak);

    unsafe { value_ptr.byte_sub(WEAK_OFFSET) }.cast()
}

suggestion: Can you avoid doing manual layout calculations here?

If you converted to a NonNull<RefCounts> first, then &raw can just mention the field to get its pointer, rather than needing to offset_of and deal in raw bytes.

(I think that'd let you drop the repr(C) on RefCounts too, which would be nice. I don't think there should be a need for it -- I don't think any of the logic here really cares whether the strong or weak count is first in memory.)
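
One way this suggestion could look, sketched with hypothetical helper names and test scaffolding (Block, value_ptr_of): the code first recovers a RefCounts pointer and then projects to a field by name, so no offset_of arithmetic is needed and RefCounts carries no repr(C). The sketch uses ptr::addr_of_mut!, the macro form of the `&raw mut` operator.

```rust
use std::cell::UnsafeCell;
use std::ptr::{addr_of_mut, NonNull};

/// Default (Rust) representation: field order is left to the compiler.
pub struct RefCounts {
    pub strong: UnsafeCell<usize>,
    pub weak: UnsafeCell<usize>,
}

/// Test scaffolding only: `repr(C)` here fixes "header, then value",
/// not the order of the counters inside the header.
#[repr(C)]
pub struct Block<T> {
    pub counts: RefCounts,
    pub value: T,
}

/// Pointer to `block.value` that keeps provenance over the whole `Block`.
pub fn value_ptr_of<T>(block: &mut Block<T>) -> NonNull<T> {
    let base: *mut Block<T> = block;
    unsafe { NonNull::new_unchecked(addr_of_mut!((*base).value)) }
}

/// # Safety
/// `value_ptr` must point at a value that directly follows a `RefCounts`.
pub unsafe fn ref_counts_ptr<T: ?Sized>(value_ptr: NonNull<T>) -> NonNull<RefCounts> {
    unsafe { value_ptr.cast::<RefCounts>().sub(1) }
}

/// # Safety
/// Same requirement as `ref_counts_ptr`.
pub unsafe fn strong_count_ptr<T: ?Sized>(value_ptr: NonNull<T>) -> NonNull<UnsafeCell<usize>> {
    unsafe {
        let counts = ref_counts_ptr(value_ptr).as_ptr();
        // Field projection on a raw pointer; no reference is created.
        NonNull::new_unchecked(addr_of_mut!((*counts).strong))
    }
}

/// # Safety
/// Same requirement as `ref_counts_ptr`.
pub unsafe fn weak_count_ptr<T: ?Sized>(value_ptr: NonNull<T>) -> NonNull<UnsafeCell<usize>> {
    unsafe {
        let counts = ref_counts_ptr(value_ptr).as_ptr();
        NonNull::new_unchecked(addr_of_mut!((*counts).weak))
    }
}
```

Because the fields are named rather than located by byte offset, the helpers stay correct whichever counter the compiler places first.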

Labels: perf-regression, S-waiting-on-review, T-compiler, T-libs